Extraction of objects and page segmentation of composite documents with non-uniform background
Authors
Abstract
In page segmentation systems for documents with complex backgrounds and poor illumination, separating the background from the objects (text and images) is crucial to the success of the system. The local neural binarization technique developed by the authors is used to extract the objects from document images with complex backgrounds. This algorithm uses statistical and textural feature measures to obtain a feature vector for each pixel from a window of size (2n + 1) × (2n + 1), where n ≥ 1. These features provide a local understanding of each pixel from its neighbourhood, making it easier to classify each pixel into its proper class. A Multi-Layer Perceptron Neural Network (MLP NN) is then used to classify each pixel in the image. The results of thresholding are then passed to a block segmentation stage. The block segmentation technique developed is a feature-based method that uses a neural network classifier to automatically segment and classify the image contents into text and halftone images. The results of page segmentation are then ready to be passed to an OCR system that converts the text image into a format that can be stored and modified.
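As a rough illustration of the per-pixel feature step described in the abstract, the sketch below builds a small feature vector for every pixel from its (2n + 1) × (2n + 1) neighbourhood. The abstract does not list the actual statistical and textural measures, so the four features used here (pixel value, local mean, local standard deviation, local range), the function name `local_pixel_features`, and the reflect padding at the borders are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_pixel_features(image: np.ndarray, n: int = 1) -> np.ndarray:
    """Return an (H, W, 4) array of features from a (2n+1) x (2n+1) window.

    The feature set (value, mean, standard deviation, range) is an
    illustrative assumption, not the feature set used in the paper.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    img = image.astype(np.float64)
    # Reflect-pad by n so that border pixels also get a full window.
    padded = np.pad(img, n, mode="reflect")
    k = 2 * n + 1
    # One k x k view per original pixel: shape (H, W, k, k).
    windows = sliding_window_view(padded, (k, k))
    mean = windows.mean(axis=(2, 3))
    std = windows.std(axis=(2, 3))
    value_range = windows.max(axis=(2, 3)) - windows.min(axis=(2, 3))
    return np.stack([img, mean, std, value_range], axis=-1)


if __name__ == "__main__":
    # Tiny synthetic grayscale image with one bright pixel in the centre.
    demo = np.zeros((5, 5))
    demo[2, 2] = 9
    features = local_pixel_features(demo, n=1)
    print(features.shape)
    print(features[2, 2])
```

Reshaping the result to `(H * W, 4)` gives one feature vector per pixel, which is the form a pixel classifier such as an MLP would take as input.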
Similar resources
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, connected component analysis segments each region into homogeneous regions and identifyi...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Segmentation Improvement of High Resolution Remote Sensing Images based on superpixels using Edge-based SLIC algorithm (E-SLIC)
The segmentation of high resolution remote sensing images is one of the most important analyses that play a significant role in the maximal and exact extraction of information. There are different types of segmentation methods among which using superpixels is one of the most important ones. Several methods have been proposed for extracting superpixels. Among the most successful ones, we can r...
Extracting Vessel Centerlines From Retinal Images Using Topographical Properties and Directional Filters
In this paper we consider the problem of blood vessel segmentation in retinal images. After enhancing the retinal image, we use the green channel of the image for segmentation, as it provides better discrimination between vessels and background. We treat the negative of the retinal green channel image as a topographical surface and extract ridge points on this surface. The points with this property are l...
Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...